<html>
<head>
    <meta NAME="description" CONTENT="Peptide import-export in Marvin">
    <meta NAME="keywords" CONTENT="peptide, sequence, Java, Marvin">
    <meta NAME="author" CONTENT="Szilveszter Juhos">
    <link REL ="stylesheet" TYPE="text/css" HREF="../marvinmanuals.css" TITLE="Style">
    <title>Peptide import-export in Marvin</title>
</head>
<body>

<h1>Peptide import and export</h1>

<p>
    Codename: <strong>peptide</strong>
</p>
<h2>Contents</h2>
<ul>
    <li><a href="#peptide_general">Peptide sequence format</a></li>
    <li><a href="#import">Import options</a></li>
    <li><a href="#export">Export options</a></li>
    <li><a href="#custom_aas">Custom amino acids</a></li>

</ul>

<h2><a class="anchor" name="peptide_general">Peptide sequence format</a></h2>
<p>
Peptides can be entered using one or three letter amino acid abbreviations.
A text file containing sequences should contain only one type of sequence (only
one or only three lettered sequences but not both). Each line must have one
and only one continuous line in the text file without spaces.
Abbreviations used:
</p>
    <table align="left">
	<tr>
	    <td>Ala</td><td>Arg</td><td>Asn</td><td>Asp</td><td>Cys</td><td>Gln</td>
	    <td>Glu</td><td>Gly</td><td>His</td><td>Ile</td><td>Leu</td><td>Lys</td>
	    <td>Met</td><td>Phe</td><td>Pro</td><td>Ser</td><td>Thr</td><td>Try</td>
	    <td>Tyr</td><td>Val</td>
	</tr>
	<tr>
	    <td>A</td><td>R</td><td>N</td><td>D</td><td>C</td><td>Q</td><td>E</td>
	    <td>G</td><td>H</td><td>I</td><td>L</td><td>K</td><td>M</td><td>F</td>
	    <td>P</td><td>S</td><td>T</td><td>W</td><td>Y</td><td>V</td>
	</tr>
    </table>

<br><br>
<p>
    Valid files are like:
</p>

    <table bgcolor="#E4F1F1">
	<tr><td>PPPALPPKKR</td></tr>
	<tr><td>aptmppplpp</td></tr>
    </table>
<br>
    <table bgcolor="#E4F1F1">
	<tr><td>ProProProAlaLeuProProLysLysArg</td></tr>
	<tr><td>AlaProThrMetProProProLeuProPro</td></tr>
    </table>
<p>
    but these are incorrect:
</p>

    <table bgcolor="#E4F1F1">
	<tr><td>PPPALPPKKR</td></tr>
	<tr><td>AlaProThrMetProProProLeuProPro</td></tr>
    </table>
<br>
    <table bgcolor="#E4F1F1">
	<tr><td>ProProProAlaLeuProProLysLysArg</td></tr>
	<tr><td>AlaProThrMetPPPLPP</td></tr>
    </table>

<h2><a class="anchor" name="import">Import options</a></h2>
<blockquote>
    <table CELLSPACING=0 CELLPADDING=5 border="0">
	<tr VALIGN="TOP">
	    <td><strong>--peptide &lt;string&gt;</strong>
		&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
	    <td>The <strong>string</strong> is a valid one or three
		letter sequence. Example:
		<table>
		    <tr>
			<td><font face="monspaced">
			    molconvert --peptide FFKMLL mol -o peptide.mol
			</font></td>
			<td>will convert a one-letter sequence to a molfile</td>
		    </tr>
		</table>
	    </td></tr>
    </table>
</blockquote>
<h2><a class="anchor" name="export">Export options</a></h2>
<blockquote>
    <table CELLSPACING=0 CELLPADDING=5 border="0">
	<tr VALIGN="TOP">
	    <td><strong>peptide:3</strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
	    <td>Using this option the output will be a three-letter sequence.
		Examples:
		<table>
		    <tr>
			<td><font face="monspaced">
			    echo "[H]NCC(=O)NC(C)C(=O)NCC(O)=O" | molconvert peptide:3
			</font></td>
			<td>will convert SMILES representation to
			    a three-letter sequence</td>
                    </tr>
                    <tr>
                        <td><font face="monspaced">
                            molconvert --peptide GAG peptide:3
                        </font></td>
                        <td>will convert one-letter sequence to
                            a three-letter sequence</td>
                    </tr>
		</table>
	    </td></tr>
	<tr VALIGN="TOP">
	    <td><strong>peptide:1</strong>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td>
	    <td>One-letter peptide sequence option. Example:
		<table>
		    <tr>
			<td><font face="monspaced">
			    echo "[H]NCC(=O)NC(C)C(=O)NCC(O)=O" | molconvert peptide:1
			</font></td>
			<td>will convert the SMILES string to
			    a one-letter sequence</td>
		    </tr>
		</table>
	    </td></tr>
    </table>
</blockquote>
<h2><a class="anchor" name="custom_aas">Custom amino acids</a></h2>
<p>
    Apart from the essential amino acids that are already recognizable, it is
    possible to define custom amino acids with non-standard sidechains or with
    alternative protonation states. The usual format of the dictionary
    file is:
</p>
<pre>
	Ala	A	[CX4H3][C@HX4H1]([NX3])C=O						3	4
	Arg	R	[N;X3][C@@H]([CH2][CH2][CH2][N;H1X3][C;X3]([N;H2X3])=N)C=O		1	10
	Asn	N	[#7;X3][C@@H]([CH2]C([N;H2X3])=O)[C;X3]=O				1	7
	Asp	D	[NX3][C@@HH1]([CH2]C([OX2H1])=O)C=O					1	7
    ...
</pre>
    <br>
    where the corresponding columns are:<br>
    <ol>
	<li>long (three-letters code) abbreviation</li>
	<li>short (one-letter code) abbreviation</li>
	<li>SMARTS representation of the amino acid fragment</li>
	<li>the number of the backbone N in the SMARTS string (the third atom
	    for Ala in the first line of the example)</li>
	<li>the number of the backbone C next to the acyl oxygen (fourth atom
	    for Ala in the first line of example)</li>
    </ol>
<p>
    To create a custom amino acid abbreviation it is assumed that its name will
    start with <b>X</b> and some other letters will follow this character
    between parentheses. It is adviced to set this string for both the short
    and the long name of the custom amino acid. Valid lines are:</p>
    <pre>
	X(Hcy)		X(Hcy)		[SX2H1][CH2][CH2][C@HH1]([NX3])C=O		5	6
	X(1-foo)	X(1-foo)	[SX2H1][CH2][C@HH1]([NX3])C=O			4	5
	X(b)		X(b)		[CH3][CH2][CH2][CH2][CH2][C@HH1]([NX3])C=O	7	8
	...
    </pre>
<p>
    Note the SMARTS strings representing amino acid fragments are denoting the
    hydrogens and sometimes the connection numbers to avoid ambiguity. For
    example if only the C[C@H](N)C=O string is used for alanin, this would match
    for many other amino acids as well as some of them are "containing" alanin
    as a substructure. Users can store their custom amino acids in the
    <font face="monospaced">custom_aminoacids.dict</font> file in the
    <font face="monospaced">.chemaxon</font> directory (UNIX) or the user's
    <font face="monospaced">chemaxon</font> directory using MS Windows.
</p>
<blockquote>

</blockquote>
</body>
</html>
